• Linear Regression: Standard linear regression models a relationship between a dependent variable (y) and an independent variable (x) as a straight line:

y = β₀ + β₁x

Where:

β₀ is the intercept.

β₁ is the slope.

  • Introducing the Quadratic Term: Quadratic regression extends linear regression by adding a squared term of the independent variable (x²):

y = β₀ + β₁x + β₂x²

Where:

β₂ is the coefficient of the squared term.

The Curve:

The x² term introduces a curve into the relationship.

If β₂ is positive, the curve opens upward (like a U).

If β₂ is negative, the curve opens downward (like an inverted U).
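The point at which the curve changes direction follows directly from the quadratic equation above. As a short worked derivation (standard calculus, not part of the original analysis), setting the derivative to zero gives the value of x at the turning point:

```latex
\frac{dy}{dx} = \beta_1 + 2\beta_2 x = 0
\quad\Longrightarrow\quad
x^{*} = -\frac{\beta_1}{2\beta_2}
```

When β₂ is positive, x* is the bottom of the U; when β₂ is negative, it is the top of the inverted U.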

1 Sheet 1

1.1 What is the relationship between population and IGF revenue performance patterns?

# Descriptive statistics
Cleaned_TMA_Data %>% skim(Population)
Data summary
Name Piped data
Number of rows 9
Number of columns 76
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Population 0 1 262939.9 81660.38 174370 177924 311206 330800 351628 ▇▁▁▂▇
Cleaned_TMA_Data %>% skim(IGF)
Data summary
Name Piped data
Number of rows 9
Number of columns 76
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
IGF 0 1 21963737 4371810 13748337 19752424 23144607 24450113 28142151 ▂▂▅▇▅
# Histograms
ggplot(Cleaned_TMA_Data, aes(x = Population)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Population", x = "Population", y = "Frequency") +
  scale_x_continuous(labels = comma)

ggplot(Cleaned_TMA_Data, aes(x = IGF)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of IGF Revenue", x = "IGF Revenue", y = "Frequency") +
  scale_x_continuous(labels = comma)

# Growth Rate (Percentage)
Cleaned_TMA_Data <- Cleaned_TMA_Data %>%
  mutate(
    Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
    IGF_Growth_Rate = c(NA, diff(IGF) / IGF[-length(IGF)] * 100)
  )

# Plot of Trends

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population)) +
  geom_point(aes(y = Population), color = "dodgerblue") +
  labs(title = "Population Trend", x = "Year", y = "Population") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = IGF)) +
  geom_point(aes(y = IGF), color = "dodgerblue") +
  labs(title = "IGF Trend", x = "Year", y = "IGF") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = IGF, color = "IGF")) +
    geom_point(aes(y = IGF, color = "IGF")) +
  labs(title = "Population vs. IGF Revenue", x = "Year", y = "Amount/Population", color = "Type") +
  scale_y_continuous(labels = comma)

# Growth rate plots
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population_Growth_Rate, color = "Population Growth")) +
    geom_point(aes(y = Population_Growth_Rate, color = "Population Growth")) +
  geom_line(aes(y = IGF_Growth_Rate, color = "IGF Growth")) +
    geom_point(aes(y = IGF_Growth_Rate, color = "IGF Growth")) +
  labs(title = "Population Growth vs. IGF Growth", x = "Year", y = "Growth Rate (%)", color = "Type") +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") # Add horizontal line at zero

The histograms show uneven distributions for both population and IGF revenue. Population is concentrated at the two ends of its range, near 175,000 and between roughly 310,000 and 350,000 (maximum 351,628). The trend plots show clearly that IGF revenue, which changed sharply from year to year, is not directly linked to population, which rose steadily.

1.1.1 Regression Analysis

mod1 <- lm(IGF ~ Population, data = Cleaned_TMA_Data)
summary(mod1)
## 
## Call:
## lm(formula = IGF ~ Population, data = Cleaned_TMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -7570834 -1776081   145785  2087103  7221596 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)   
## (Intercept) 25475147.92  5368698.90   4.745   0.0021 **
## Population       -13.35       19.60  -0.682   0.5175   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4526000 on 7 degrees of freedom
## Multiple R-squared:  0.06222,    Adjusted R-squared:  -0.07175 
## F-statistic: 0.4645 on 1 and 7 DF,  p-value: 0.5175
Cleaned_TMA_Data %>%
  ggplot(aes(x = Population, y = IGF)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) + 
  labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and IGF Revenue") + 
  scale_y_continuous(labels = scales::comma)

# The Quadratic Term
Cleaned_TMA_Data$Population_Squared <- Cleaned_TMA_Data$Population^2

#  Quadratic Regression
mod_quad <- lm(IGF ~ Population + Population_Squared, data = Cleaned_TMA_Data)

summary(mod_quad)
## 
## Call:
## lm(formula = IGF ~ Population + Population_Squared, data = Cleaned_TMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4731775 -3941607   302186  2508018  6084357 
## 
## Coefficients:
##                            Estimate       Std. Error t value Pr(>|t|)
## (Intercept)        82622023.3281143 54289197.1474813   1.522    0.179
## Population             -503.5459820      463.8418536  -1.086    0.319
## Population_Squared        0.0009558        0.0009036   1.058    0.331
## 
## Residual standard error: 4488000 on 6 degrees of freedom
## Multiple R-squared:  0.2096, Adjusted R-squared:  -0.05386 
## F-statistic: 0.7956 on 2 and 6 DF,  p-value: 0.4938
ggplot(Cleaned_TMA_Data, aes(x = Population, y = IGF)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) + # Use formula for quadratic
  labs(x = "Population", y = "IGF Revenue (Ghana Cedis)", title = "Quadratic Relationship between Population and IGF Revenue") +
  scale_y_continuous(labels = comma)

Linear Regression:

Coefficients:

Intercept: 25475147.92

Population: -13.35. For each one-person increase in population, IGF is predicted to decrease by approximately 13.35 Ghana Cedis, but this estimate is not statistically significant.

P-values: Intercept: 0.0021 (significant)

Population: 0.5175 (insignificant)

R-squared: Multiple R-squared: 0.0622

Adjusted R-squared: -0.07175

Interpretation:

The linear model shows a very weak and statistically insignificant relationship between population and IGF revenue.
Population explains only 6.22% of the variance in IGF.
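The negative adjusted R-squared follows from the small sample. As a minimal sketch using only the values printed by summary(mod1) above (the helper name adj_r2 is hypothetical, introduced here for illustration), the standard adjustment can be reproduced by hand:

```r
# Adjusted R-squared penalises R-squared for the number of predictors p
# relative to the sample size n. adj_r2 is an illustrative helper name.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Linear model: R-squared = 0.06222, n = 9 observations, p = 1 predictor.
adj_linear <- adj_r2(0.06222, n = 9, p = 1)
print(round(adj_linear, 5))  # agrees with the reported -0.07175
```

With only nine observations, the factor (n - 1) / (n - p - 1) = 8/7 is enough to push such a small R-squared below zero.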

Quadratic Regression:

Coefficients: Intercept: 82622023.3281143

Population: -503.5459820

Population_Squared: 0.0009558

P-values: None of the coefficients is statistically significant (all p > 0.05), and the overall model is also statistically insignificant (p-value = 0.4938).

R-squared: Multiple R-squared: 0.2096

Adjusted R-squared: -0.05386

Interpretation:

The quadratic model also shows a statistically insignificant relationship between population and IGF revenue. The insignificant squared term means that, even if the relationship is not linear, a quadratic curve does not capture it either.

The R-squared of 0.2096 indicates that the quadratic model explains 20.96% of the variance in IGF, a modest improvement over the linear model. However, because the model is not significant and its adjusted R-squared is still negative (-0.05386), this improvement cannot be relied on.
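Whether the squared term improves on the linear model can be checked with a nested-model F-test. In an interactive session this would typically be anova(mod1, mod_quad); the sketch below instead reproduces the test from the printed t value alone, relying on the fact that for a single added term the F statistic equals the squared t statistic. The variable names are illustrative.

```r
# t value and residual degrees of freedom for Population_Squared,
# copied from the summary(mod_quad) output.
t_sq_term <- 1.058
df_resid <- 6

# For one added coefficient, the nested-model F statistic equals t^2.
f_stat <- t_sq_term^2
p_value <- pf(f_stat, df1 = 1, df2 = df_resid, lower.tail = FALSE)

print(round(c(F = f_stat, p = p_value), 3))
```

Up to rounding, the p-value agrees with the 0.331 reported for Population_Squared, so the extra term does not significantly improve the fit.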

  • Transformations
# Transformed Model
lm(Ln_IGF ~ Ln_Pop, data = Cleaned_TMA_Data) %>% summary()
## 
## Call:
## lm(formula = Ln_IGF ~ Ln_Pop, data = Cleaned_TMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40856 -0.06193  0.00984  0.12375  0.32499 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  19.2207     2.9458   6.525 0.000326 ***
## Ln_Pop       -0.1878     0.2369  -0.793 0.453806    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2208 on 7 degrees of freedom
## Multiple R-squared:  0.08243,    Adjusted R-squared:  -0.04865 
## F-statistic: 0.6289 on 1 and 7 DF,  p-value: 0.4538
# Scatter Plots (Transformed Data)
ggplot(Cleaned_TMA_Data, aes(x = Ln_Pop, y = Ln_IGF)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Log(Population) vs. Log(IGF Revenue)", x = "Log(Population)", y = "Log(IGF Revenue)")

#GAM
 gam(IGF ~ s(Population, k = 9) + Ln_Tt_Revenue + CollRate_Fees, data = Cleaned_TMA_Data) %>% summary()
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## IGF ~ s(Population, k = 9) + Ln_Tt_Revenue + CollRate_Fees
## 
## Parametric coefficients:
##                 Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   -294996765   71285289  -4.138  0.00901 **
## Ln_Tt_Revenue   18303341    3941990   4.643  0.00562 **
## CollRate_Fees     -12855      31056  -0.414  0.69609   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##               edf Ref.df     F p-value
## s(Population)   1      1 2.884    0.15
## 
## R-sq.(adj) =  0.911   Deviance explained = 94.5%
## GCV = 3.0482e+12  Scale est. = 1.6935e+12  n = 9

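After the log transformation the result is still insignificant (p-value: 0.4538; Multiple R-squared: 0.08243).

Even with the flexible Generalized Additive Model (GAM), the smooth term for population was not statistically significant (p = 0.15). Its estimated degrees of freedom (edf = 1) show that the smooth reduced to a straight line. The high adjusted R-squared of 0.911 appears to be driven by Ln_Tt_Revenue, the only significant predictor (p = 0.00562).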
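For reference, the slope of the log-log model above can be read as an elasticity. The following is a short worked interpretation using the printed coefficients, assuming that Ln_IGF and Ln_Pop are the natural logarithms of IGF and Population (the names suggest this, but the construction is not shown here):

```latex
\ln(\text{IGF}) = 19.2207 - 0.1878\,\ln(\text{Population})
\quad\Longrightarrow\quad
\frac{\%\,\Delta\,\text{IGF}}{\%\,\Delta\,\text{Population}} \approx -0.1878
```

That is, a 1% increase in population is associated with roughly a 0.19% decrease in IGF, an estimate that is not statistically distinguishable from zero (p = 0.4538).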

  • Checking Regression Assumptions
# Scatter Plot

ggplot(Cleaned_TMA_Data, aes(x = Population, y = IGF)) +
  geom_point() +
  labs(title = "Population vs. IGF Revenue", x = "Population", y = "IGF Revenue")

# Residual
ggplot(data = data.frame(residuals = residuals(mod1), fitted = fitted(mod1)), aes(x = fitted, y = residuals)) +
  geom_point() + # Added geom_point()
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Linear)", x = "Fitted Values", y = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod1)), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals (Linear)", x = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod1)), aes(sample = residuals)) +
  geom_point(stat = "qq") +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals")

shapiro.test(resid(mod1))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(mod1)
## W = 0.98987, p-value = 0.9959
#  Residuals vs. Fitted Values
ggplot(data = data.frame(residuals = residuals(mod_quad), fitted = fitted(mod_quad)), 
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted (Quadratic Model)", x = "Fitted Values", y = "Residuals")

#  Histogram of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals (Quadratic Model)", x = "Residuals")

shapiro.test(resid(mod_quad))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(mod_quad)
## W = 0.93329, p-value = 0.5132
#  Q-Q Plot of Residuals
ggplot(data = data.frame(residuals = residuals(mod_quad)), aes(sample = residuals)) +
  geom_point(stat = "qq") +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals (Quadratic Model)")

#  Durbin-Watson Test (Autocorrelation)
dwtest(mod1)
## 
##  Durbin-Watson test
## 
## data:  mod1
## DW = 1.3918, p-value = 0.08374
## alternative hypothesis: true autocorrelation is greater than 0
dwtest(mod_quad)
## 
##  Durbin-Watson test
## 
## data:  mod_quad
## DW = 1.8379, p-value = 0.1227
## alternative hypothesis: true autocorrelation is greater than 0
#  Breusch-Pagan Test (Homoscedasticity)
bptest(mod1)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod1
## BP = 1.271, df = 1, p-value = 0.2596
bptest(mod_quad)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod_quad
## BP = 2.3299, df = 2, p-value = 0.3119
#  Variance Inflation Factor (VIF) - Multicollinearity
vif(mod_quad)
##         Population Population_Squared 
##            569.833            569.833

For the linear model, all the assumptions tested are met. For the quadratic model, multicollinearity is present (VIF = 569.833 for both terms) and a pattern can be seen in the residuals, so the quadratic model violates two assumptions.

Therefore, the analysis so far finds a negative, possibly non-linear, but statistically insignificant relationship between population and IGF revenue, even though the linear model did not violate any assumption. The relationship could not be captured by the linear model, the quadratic model, the log transformation, or the GAM. The scatter plots indicate two clusters in the population and IGF data.
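The very large VIF is expected when a variable and its square are entered together. As a rough sketch using only the printed VIF (variable names are illustrative), the definition VIF = 1 / (1 - R²) can be inverted to see how closely Population and Population_Squared track each other:

```r
# VIF reported by vif(mod_quad) for both terms.
vif_reported <- 569.833

# VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor
# on the other; invert it to recover that R^2.
r2_between_terms <- 1 - 1 / vif_reported
cor_between_terms <- sqrt(r2_between_terms)

print(round(c(R2 = r2_between_terms, abs_cor = cor_between_terms), 4))
```

A common remedy, not applied in this analysis, is to centre Population before squaring it or to use orthogonal polynomials such as poly(Population, 2). Either option leaves the fitted curve unchanged while removing most of the collinearity between the two terms.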

1.2 What is the relationship between population and DACF revenue performance patterns?

Cleaned_TMA_Data %>% skim(Population)
Data summary
Name Piped data
Number of rows 9
Number of columns 79
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Population 0 1 262939.9 81660.38 174370 177924 311206 330800 351628 ▇▁▁▂▇
Cleaned_TMA_Data %>% skim(DACF)
Data summary
Name Piped data
Number of rows 9
Number of columns 79
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
DACF 0 1 2030075 470625.9 1326091 1761596 2037955 2254656 2863934 ▂▇▅▅▂
# Histograms
ggplot(Cleaned_TMA_Data, aes(x = Population)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Population", x = "Population")

ggplot(Cleaned_TMA_Data, aes(x = DACF)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of DACF Revenue", x = "DACF Revenue")

# Growth Rates
Cleaned_TMA_Data <- Cleaned_TMA_Data %>%
  mutate(
    Population_Growth_Rate = c(NA, diff(Population) / Population[-length(Population)] * 100),
    DACF_Growth_Rate = c(NA, diff(DACF) / DACF[-length(DACF)] * 100)
  )




# Plotting Trends


ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population)) +
  geom_point(aes(y = Population), color = "dodgerblue") +
  labs(title = "Population Trend", x = "Year", y = "Population") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = DACF)) +
  geom_point(aes(y = DACF), color = "dodgerblue") +
  labs(title = "DACF Trend", x = "Year", y = "DACF") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = DACF, color = "DACF")) +
  geom_point(aes(y = DACF, color = "DACF")) +
  labs(title = "Population vs. DACF Revenue", x = "Year", y = "Amount/Population", color = "Type") +
  scale_y_continuous(labels = scales::comma)

# Plotting Growth Rates
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population_Growth_Rate, color = "Population Growth")) +
  geom_point(aes(y = Population_Growth_Rate, color = "Population Growth")) +
  geom_line(aes(y = DACF_Growth_Rate, color = "DACF Growth")) +
  geom_point(aes(y = DACF_Growth_Rate, color = "DACF Growth")) +
  labs(title = "Population Growth vs. DACF Growth", x = "Year", y = "Growth Rate (%)", color = "Type")+
  geom_hline(yintercept = 0, linetype = "dashed", color = "red")

The histograms show uneven distributions of population and DACF revenue. The trend plots show that DACF revenue, which changed sharply from year to year, tends to move in the opposite direction to population, which rose steadily, so the two are not directly linked.

1.2.1 Regression Analysis

mod2 <- lm(DACF ~ Population, data = Cleaned_TMA_Data)
summary(mod2)
## 
## Call:
## lm(formula = DACF ~ Population, data = Cleaned_TMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -610654 -419869  139098  336566  664509 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)   
## (Intercept) 2538511.876  562212.239   4.515  0.00275 **
## Population       -1.934       2.052  -0.942  0.37740   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 474000 on 7 degrees of freedom
## Multiple R-squared:  0.1126, Adjusted R-squared:  -0.0142 
## F-statistic: 0.888 on 1 and 7 DF,  p-value: 0.3774
Cleaned_TMA_Data %>%
  ggplot(aes(x = Population, y = DACF)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) + # Added confidence intervals
  labs(x = "Population", y = "DACF Revenue (Ghana Cedis)", title = "Linear Relationship between Population and DACF Revenue") +
  scale_y_continuous(labels = scales::comma)

 lm(DACF ~ Population + Population_Squared, data = Cleaned_TMA_Data) %>% summary()
## 
## Call:
## lm(formula = DACF ~ Population + Population_Squared, data = Cleaned_TMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -540806 -444574  128239  375051  645015 
## 
## Coefficients:
##                            Estimate       Std. Error t value Pr(>|t|)
## (Intercept)        4467282.59590058 6141838.00509749   0.727    0.494
## Population             -18.47817181      52.47529295  -0.352    0.737
## Population_Squared       0.00003226       0.00010222   0.316    0.763
## 
## Residual standard error: 507700 on 6 degrees of freedom
## Multiple R-squared:  0.1271, Adjusted R-squared:  -0.1639 
## F-statistic: 0.4367 on 2 and 6 DF,  p-value: 0.6652

There is a negative but statistically insignificant relationship between population and DACF revenue: as population increases, DACF tends to decrease. Population explains only 11.26% of the variance in DACF. The quadratic model is not significant either.

  • Checking Regression Assumptions
 #Scatter Plot 
ggplot(Cleaned_TMA_Data, aes(x = Population, y = DACF)) +
  geom_point() +
  labs(title = "Population vs. DACF Revenue",
       x = "Population", y = "DACF Revenue")

#  Residual 
ggplot(data = data.frame(residuals = residuals(mod2),
                        fitted = fitted(mod2)),
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted",
       x = "Fitted Values", y = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod2)),
       aes(x = residuals)) +
  geom_histogram(bins = 10, fill = "skyblue", color = "black") +
  labs(title = "Histogram of Residuals", x = "Residuals")

ggplot(data = data.frame(residuals = residuals(mod2)),
       aes(sample = residuals)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Q-Q Plot of Residuals ")

shapiro.test(resid(mod2))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(mod2)
## W = 0.93748, p-value = 0.5556
# Autocorrelation
dwtest(mod2)
## 
##  Durbin-Watson test
## 
## data:  mod2
## DW = 2.8719, p-value = 0.8767
## alternative hypothesis: true autocorrelation is greater than 0
# Homoscedasticity (Constant Variance of Residuals)

bptest(mod2)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod2
## BP = 2.9822, df = 1, p-value = 0.08419
# Multicollinearity
# This is a simple linear regression with one predictor (Population), so multicollinearity is not an issue.


# Multivariate Normality

# This is a simple linear regression with one predictor (Population), so this is not an issue.

The scatter plot shows two clusters and a possibly non-linear relationship. It suggests that as population increases, DACF revenue tends to decrease. The histogram of residuals suggests a potential violation of the normality assumption, although the Shapiro-Wilk test did not detect one (p = 0.5556). The Durbin-Watson test found no positive autocorrelation (p = 0.8767), and the Breusch-Pagan test did not reject homoscedasticity at the 5% level (p = 0.08419).
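One caveat about the Durbin-Watson result: the statistic of 2.8719 lies above 2, which points towards negative rather than positive autocorrelation, and the default one-sided test (alternative: greater than 0) is not designed to detect that. As a rough sketch, the large-sample approximation DW ≈ 2(1 - ρ) gives the implied first-order autocorrelation; with only nine observations this is indicative at best. The variable names are illustrative.

```r
# Durbin-Watson statistic reported by dwtest(mod2).
dw_stat <- 2.8719

# Large-sample approximation: DW is roughly 2 * (1 - rho), so
# rho is roughly 1 - DW / 2. Values of DW above 2 imply negative rho.
rho_approx <- 1 - dw_stat / 2

print(round(rho_approx, 3))
```

In lmtest, dwtest() also accepts alternative = "two.sided" or alternative = "less", which would test for this direction explicitly.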

  • Transformation
#Transformed Models
log_mod2 <- lm(log(DACF) ~ log(Population), data = Cleaned_TMA_Data)
summary(log_mod2 )
# 
# Call:
# lm(formula = log(DACF) ~ log(Population), data = Cleaned_TMA_Data)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.35558 -0.18516  0.08728  0.18003  0.29081 
# 
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)    
# (Intercept)      17.1780     3.1619   5.433 0.000974 ***
# log(Population)  -0.2154     0.2542  -0.847 0.424838    
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 0.237 on 7 degrees of freedom
# Multiple R-squared:  0.09302, Adjusted R-squared:  -0.03655 
# F-statistic: 0.7179 on 1 and 7 DF,  p-value: 0.4248
sqrt_mod2 <- lm( sqrt(DACF)~sqrt(Population), data = Cleaned_TMA_Data )  
summary(sqrt_mod2)
# 
# Call:
# lm(formula = sqrt(DACF) ~ sqrt(Population), data = Cleaned_TMA_Data)
# 
# Residuals:
#     Min      1Q  Median      3Q     Max 
# -232.01 -139.20   55.13  123.50  219.16 
# 
# Coefficients:
#                   Estimate Std. Error t value Pr(>|t|)   
# (Intercept)      1742.8430   369.6621   4.715  0.00217 **
# sqrt(Population)   -0.6440     0.7209  -0.893  0.40134   
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 166.5 on 7 degrees of freedom
# Multiple R-squared:  0.1023,  Adjusted R-squared:  -0.0259 
# F-statistic: 0.7981 on 1 and 7 DF,  p-value: 0.4013
#  Scatter Plots (Transformed Data)
ggplot(Cleaned_TMA_Data, aes(x = log(Population), y = log(DACF))) +
  geom_point() +
  geom_smooth(method = "lm")+
  labs(title = "Log(Population) vs. Log(DACF Revenue)",
       x = "Log(Population)", y = "Log(DACF Revenue)")

ggplot(Cleaned_TMA_Data, aes(x = sqrt(Population), y = sqrt(DACF))) +
  geom_point() +
  geom_smooth(method = "lm")+
  labs(title = "Sqrt(Population) vs. Sqrt(DACF Revenue)",
       x = "Sqrt(Population)", y = "Sqrt(DACF Revenue)")

Neither the log-log nor the square-root transformation is statistically significant, and neither improved the fit compared with the linear model (R-squared of 0.0930 and 0.1023 versus 0.1126).

# Function to perform diagnostic tests and plots
perform_diagnostics <- function(model, model_name) {
  # Residuals vs. Fitted
  plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
                 aes(x = fitted, y = residuals)) +
    geom_point() +
    geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
    labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")

  # Histogram of Residuals
  plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
    geom_histogram(bins = 10, fill = "skyblue", color = "black") +
    labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")

  # Q-Q Plot of Residuals
  plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
    geom_point(stat = "qq") +
    stat_qq_line() +
    labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))

  # Durbin-Watson Test
  dw_test <- dwtest(model)
  print(paste("Durbin-Watson Test (", model_name, "):"))
  print(dw_test)

  # Breusch-Pagan Test
  bp_test <- bptest(model)
  print(paste("Breusch-Pagan Test (", model_name, "):"))
  print(bp_test)

  # Print VIF (if applicable)
  if (length(coef(model)) > 2) { # Check for multiple predictors
    vif_result <- vif(model)
    print(paste("VIF (", model_name, "):"))
    print(vif_result)
  }

  # Arrange plots
  grid.arrange(plot1, plot2, plot3, nrow = 1)
}

# Perform diagnostics for each model
perform_diagnostics(mod2, "Linear Model")
## [1] "Durbin-Watson Test ( Linear Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 2.8719, p-value = 0.8767
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Linear Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 2.9822, df = 1, p-value = 0.08419

perform_diagnostics(log_mod2, "Log-Log Model")
## [1] "Durbin-Watson Test ( Log-Log Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 2.6567, p-value = 0.7738
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Log-Log Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.87598, df = 1, p-value = 0.3493

perform_diagnostics(sqrt_mod2, "Square Root Model")
## [1] "Durbin-Watson Test ( Square Root Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 2.7683, p-value = 0.831
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Square Root Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 1.8828, df = 1, p-value = 0.17

shapiro.test(resid(mod2))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(mod2)
## W = 0.93748, p-value = 0.5556
shapiro.test(resid(log_mod2))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(log_mod2)
## W = 0.93697, p-value = 0.5504
shapiro.test(resid(sqrt_mod2))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(sqrt_mod2)
## W = 0.9401, p-value = 0.5829

The diagnostic tests indicate that all three models pass the assumption tests, but the plots show a slight violation of the normality assumption.

Therefore, the regression results suggest that all three models fail to capture the relationship, and the transformations did not change the situation. The small dataset (nine observations) may be a hindrance, so the relationship remains unclear given this data.
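To illustrate why the small sample may be a hindrance, the sketch below computes how strong a simple linear relationship would need to be before it reaches significance at the 5% level with nine observations. It uses the standard relation between the t statistic and the correlation coefficient; the variable names are illustrative.

```r
# Simple regression with n = 9 observations leaves n - 2 = 7 residual df.
n_obs <- 9
df_resid <- n_obs - 2

# Two-sided 5% critical t value, converted to the smallest |r| (and R^2)
# that would be declared significant: r = t / sqrt(t^2 + df).
t_crit <- qt(0.975, df = df_resid)
r_crit <- t_crit / sqrt(t_crit^2 + df_resid)

print(round(c(t_crit = t_crit, r_crit = r_crit, r2_crit = r_crit^2), 3))
```

Under these assumptions, a single-predictor model would need an R-squared of roughly 0.44 to be significant, well above the 0.09 to 0.11 observed for the three DACF models.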

1.3 What is the relationship between population, recurrent and capital expenditure?

  • Descriptive Statistics
# Calculate descriptive statistics
desc_stats <- Cleaned_TMA_Data %>%
  summarize(
    Population_mean = mean(Population),
    Population_sd = sd(Population),
    Population_min = min(Population),
    Population_max = max(Population),
    Capital_Expenditure_mean = mean(Capital_Expenditure),
    Capital_Expenditure_sd = sd(Capital_Expenditure),
    Capital_Expenditure_min = min(Capital_Expenditure),
    Capital_Expenditure_max = max(Capital_Expenditure),
    Recrrent_Expenditure_mean = mean(Recrrent_Expenditure),
    Recrrent_Expenditure_sd = sd(Recrrent_Expenditure),
    Recrrent_Expenditure_min = min(Recrrent_Expenditure),
    Recrrent_Expenditure_max = max(Recrrent_Expenditure)
  )


cat("
## Descriptive Statistics

| Statistic               | Population | Capital Expenditure | Recurrent Expenditure |
|------------------------|------------|---------------------|-----------------------|
| Mean                   |", format(desc_stats$Population_mean, big.mark = ",", digits = 2),
  "|", format(desc_stats$Capital_Expenditure_mean, big.mark = ",", digits = 2),
  "|", format(desc_stats$Recrrent_Expenditure_mean, big.mark = ",", digits = 2), "|
| Standard Deviation     |", format(desc_stats$Population_sd, big.mark = ",", digits = 2),
  "|", format(desc_stats$Capital_Expenditure_sd, big.mark = ",", digits = 2),
  "|", format(desc_stats$Recrrent_Expenditure_sd, big.mark = ",", digits = 2), "|
| Minimum                |", format(desc_stats$Population_min, big.mark = ",", digits = 2),
  "|", format(desc_stats$Capital_Expenditure_min, big.mark = ",", digits = 2),
  "|", format(desc_stats$Recrrent_Expenditure_min, big.mark = ",", digits = 2), "|
| Maximum                |", format(desc_stats$Population_max, big.mark = ",", digits = 2),
  "|", format(desc_stats$Capital_Expenditure_max, big.mark = ",", digits = 2),
  "|", format(desc_stats$Recrrent_Expenditure_max, big.mark = ",", digits = 2), "|
\n")
## 
## ## Descriptive Statistics
## 
## | Statistic               | Population | Capital Expenditure | Recurrent Expenditure |
## |------------------------|------------|---------------------|-----------------------|
## | Mean                   | 262,940 | 8,430,470 | 15,377,435 |
## | Standard Deviation     | 81,660 | 3,635,231 | 5,408,861 |
## | Minimum                | 174,370 | 4,724,892 | 5,140,669 |
## | Maximum                | 351,628 | 16,210,705 | 24,388,461 |
# Capital Expenditure Histogram
cap_hist <- ggplot(Cleaned_TMA_Data, aes(x = Capital_Expenditure)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "skyblue", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Capital Expenditure", x = "Capital Expenditure (Ghana Cedis)", y = "Density") +
  scale_x_continuous(labels = comma) 

# Recurrent Expenditure Histogram
rec_hist <- ggplot(Cleaned_TMA_Data, aes(x = Recrrent_Expenditure)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "lightgreen", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Recurrent Expenditure", x = "Recurrent Expenditure (Ghana Cedis)", y = "Density") +
  scale_x_continuous(labels = comma) 

# Population Histogram
pop_hist <- ggplot(Cleaned_TMA_Data, aes(x = Population)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "dodgerblue", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Population", x = "Population", y = "Density") +
  scale_x_continuous(labels = comma) 

cap_hist

rec_hist

pop_hist

  • Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population)) +
  geom_point(aes(y = Population), color = "dodgerblue") +
  labs(title = "Population Trend", x = "Year", y = "Population") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  labs(title = "Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
  theme(axis.title.y.right = element_text(vjust=2))

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_line(aes(y = Rec_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  geom_point(aes(y = Rec_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  labs(title = "Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
  scale_y_continuous(labels = comma)

# Calculate Per Capita Values
Cleaned_TMA_Data$Capital_Exp_Per_Capita <- Cleaned_TMA_Data$Capital_Expenditure / Cleaned_TMA_Data$Population

# Plotting Trends 
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  labs(title = "Population and Capital Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
  scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
  theme(axis.title.y.right = element_text(vjust=2))

# Per Capita Analysis 
average_capita <- mean(Cleaned_TMA_Data$Capital_Exp_Per_Capita)

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_hline(yintercept = average_capita, linetype = "dashed", color = "red")+
  labs(title = "Capital Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
  scale_y_continuous(labels = comma) 

Cleaned_TMA_Data$Recrrent_Exp_Per_Capita <- Cleaned_TMA_Data$Recrrent_Expenditure / Cleaned_TMA_Data$Population
average_rec_capita <- mean(Cleaned_TMA_Data$Recrrent_Exp_Per_Capita)

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  geom_point(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  geom_hline(yintercept = average_rec_capita, linetype = "dashed", color = "red") +
  labs(title = "Recurrent Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Population, color = "Population")) +
  geom_point(aes(y = Population, color = "Population")) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_line(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  geom_point(aes(y = Recrrent_Expenditure, color = "Recurrent Expenditure")) +
  labs(title = "Population and Expenditure Trends", x = "Year", y = "Amount", color = "Type") +
  scale_y_continuous(labels = comma, sec.axis = sec_axis(~., name = "Population")) +
  theme(axis.title.y.right = element_text(vjust=2))

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Exp. Per Capita")) +
  geom_line(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  geom_point(aes(y = Recrrent_Exp_Per_Capita, color = "Recurrent Exp. Per Capita")) +
  labs(title = "Expenditure Per Capita Over Time", x = "Year", y = "Ghana Cedis Per Capita", color = "Type") +
  scale_y_continuous(labels = comma)

1.3.1 Regression Results

mod3 <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population, data = Cleaned_TMA_Data)
summary(mod3)
## Response Capital_Expenditure :
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_TMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3867539 -2695139 -1036731  2815735  7032439 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5532959.18 4466411.60   1.239    0.255
## Population       11.02      16.30   0.676    0.521
## 
## Residual standard error: 3765000 on 7 degrees of freedom
## Multiple R-squared:  0.06128,    Adjusted R-squared:  -0.07283 
## F-statistic: 0.4569 on 1 and 7 DF,  p-value: 0.5208
## 
## 
## Response Recrrent_Expenditure :
## 
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_TMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -10603509  -1652861   -983931   2407651   8417481 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)  
## (Intercept) 13379529.869  6813765.355   1.964   0.0903 .
## Population         7.598       24.870   0.306   0.7689  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5744000 on 7 degrees of freedom
## Multiple R-squared:  0.01316,    Adjusted R-squared:  -0.1278 
## F-statistic: 0.09335 on 1 and 7 DF,  p-value: 0.7689
mod_cap <- lm(Capital_Expenditure ~ Population, data = Cleaned_TMA_Data)
summary(mod_cap)
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population, data = Cleaned_TMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3867539 -2695139 -1036731  2815735  7032439 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5532959.18 4466411.60   1.239    0.255
## Population       11.02      16.30   0.676    0.521
## 
## Residual standard error: 3765000 on 7 degrees of freedom
## Multiple R-squared:  0.06128,    Adjusted R-squared:  -0.07283 
## F-statistic: 0.4569 on 1 and 7 DF,  p-value: 0.5208
mod_rec <- lm(Recrrent_Expenditure ~ Population, data = Cleaned_TMA_Data)
summary(mod_rec)
## 
## Call:
## lm(formula = Recrrent_Expenditure ~ Population, data = Cleaned_TMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -10603509  -1652861   -983931   2407651   8417481 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)  
## (Intercept) 13379529.869  6813765.355   1.964   0.0903 .
## Population         7.598       24.870   0.306   0.7689  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5744000 on 7 degrees of freedom
## Multiple R-squared:  0.01316,    Adjusted R-squared:  -0.1278 
## F-statistic: 0.09335 on 1 and 7 DF,  p-value: 0.7689
Cleaned_TMA_Data %>%
  ggplot(aes(x = Population, y = Capital_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Population", y = "Capital Expenditure", title = "Linear Relationship between Population and Capital Expenditure") +
  scale_y_continuous(labels = scales::comma)

Cleaned_TMA_Data %>%
  ggplot(aes(x = Population, y = Recrrent_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Population", y = "Recurrent Expenditure", title = "Linear Relationship between Population and Recurrent Expenditure") +
  scale_y_continuous(labels = scales::comma)

The linear regression results show no statistically significant linear relationship between Population and either expenditure type. Both models have very small R-squared values (0.061 for capital expenditure and 0.013 for recurrent expenditure) and high p-values (0.521 and 0.769), indicating poor model fit.

  • Checking Regression Assumptions
# Diagnostic Function
perform_diagnostics <- function(model, model_name) {
  # Residuals vs. Fitted
  plot1 <- ggplot(data = data.frame(residuals = residuals(model), fitted = fitted(model)),
                 aes(x = fitted, y = residuals)) +
    geom_point() +
    geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
    labs(title = paste("Residuals vs. Fitted (", model_name, ")"), x = "Fitted Values", y = "Residuals")

  # Histogram of Residuals
  plot2 <- ggplot(data = data.frame(residuals = residuals(model)), aes(x = residuals)) +
    geom_histogram(bins = 10, fill = "skyblue", color = "black") +
    labs(title = paste("Histogram of Residuals (", model_name, ")"), x = "Residuals")

  # Q-Q Plot of Residuals
  plot3 <- ggplot(data = data.frame(residuals = residuals(model)), aes(sample = residuals)) +
    geom_point(stat = "qq") +
    stat_qq_line() +
    labs(title = paste("Q-Q Plot of Residuals (", model_name, ")"))

  # Durbin-Watson Test
  dw_test <- dwtest(model)
  print(paste("Durbin-Watson Test (", model_name, "):"))
  print(dw_test)

  # Breusch-Pagan Test
  bp_test <- bptest(model)
  print(paste("Breusch-Pagan Test (", model_name, "):"))
  print(bp_test)

  # Print VIF (if applicable)
  if (length(coef(model)) > 2) { # Check for multiple predictors
    vif_result <- vif(model)
    print(paste("VIF (", model_name, "):"))
    print(vif_result)
  }

  # Arrange plots
  grid.arrange(plot1, plot2, plot3, nrow = 1)
}

#  Perform Diagnostics
# Capital Expenditure
perform_diagnostics(mod_cap, "Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Capital Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 1.8832, p-value = 0.2945
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Capital Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.26711, df = 1, p-value = 0.6053

# Recurrent Expenditure
perform_diagnostics(mod_rec, "Recurrent Expenditure Model")
## [1] "Durbin-Watson Test ( Recurrent Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 1.1903, p-value = 0.03879
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Recurrent Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.92089, df = 1, p-value = 0.3372

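The tests above show that the Recurrent Expenditure model violates the no-autocorrelation assumption (Durbin-Watson = 1.19, p = 0.039), while neither model shows evidence of heteroscedasticity (Breusch-Pagan p = 0.605 and p = 0.337). The Capital Expenditure model violates the normality assumption, judging by its residual histogram and Q-Q plot.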
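
The normality judgement above rests on the residual histogram and Q-Q plot. A formal test could complement those plots. The following is a minimal sketch, assuming a Shapiro-Wilk test on the residuals is an acceptable check here; the data frame `toy` and its simulated values are hypothetical stand-ins, not the TMA data.

```r
# Hypothetical toy data standing in for Cleaned_TMA_Data (9 yearly observations)
set.seed(1)
toy <- data.frame(Population = seq(170000, 350000, length.out = 9))
toy$Capital_Expenditure <- 5e6 + 11 * toy$Population + rnorm(9, sd = 3.5e6)

toy_mod <- lm(Capital_Expenditure ~ Population, data = toy)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normally
# distributed, so a small p-value is evidence against normality
sw <- shapiro.test(residuals(toy_mod))
print(sw)
```

With only nine residuals the test has little power, so a non-significant result would not by itself establish normality.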

  • Transformations
# Log Transformation for Recurrent Expenditure 
log_rec_mod <- lm(log(Recrrent_Expenditure) ~ Population, data = Cleaned_TMA_Data)
summary(log_rec_mod)
## 
## Call:
## lm(formula = log(Recrrent_Expenditure) ~ Population, data = Cleaned_TMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.02458 -0.09075  0.02795  0.24488  0.53189 
## 
## Coefficients:
##                   Estimate     Std. Error t value    Pr(>|t|)    
## (Intercept) 16.47250443341  0.55699486589  29.574 0.000000013 ***
## Population   0.00000001531  0.00000203298   0.008       0.994    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4696 on 7 degrees of freedom
## Multiple R-squared:  8.104e-06,  Adjusted R-squared:  -0.1428 
## F-statistic: 5.673e-05 on 1 and 7 DF,  p-value: 0.9942
perform_diagnostics(log_rec_mod, "Log Recurrent Expenditure Model")
## [1] "Durbin-Watson Test ( Log Recurrent Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 1.0731, p-value = 0.02204
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Log Recurrent Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.7591, df = 1, p-value = 0.3836

log_cap_mod <- lm(log(Capital_Expenditure) ~ Population, data = Cleaned_TMA_Data)
summary(log_cap_mod)
## 
## Call:
## lm(formula = log(Capital_Expenditure) ~ Population, data = Cleaned_TMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.49193 -0.37141 -0.05009  0.39407  0.63900 
## 
## Coefficients:
##                 Estimate   Std. Error t value      Pr(>|t|)    
## (Intercept) 15.514347363  0.504419752  30.757 0.00000000991 ***
## Population   0.000001354  0.000001841   0.735         0.486    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4252 on 7 degrees of freedom
## Multiple R-squared:  0.07171,    Adjusted R-squared:  -0.06091 
## F-statistic: 0.5407 on 1 and 7 DF,  p-value: 0.486
perform_diagnostics(log_cap_mod, "Log capital Expenditure Model")
## [1] "Durbin-Watson Test ( Log capital Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 1.5551, p-value = 0.1373
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Log capital Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.15613, df = 1, p-value = 0.6927

Cleaned_TMA_Data$Ln_Population <- log(Cleaned_TMA_Data$Population)
Cleaned_TMA_Data$Ln_Capital_Expenditure <- log(Cleaned_TMA_Data$Capital_Expenditure)

ggplot(Cleaned_TMA_Data, aes(x = log(Population), y = log(Capital_Expenditure))) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)+
  labs(title = "Log(Population) vs. Log(Capital Expenditure)",
       x = "Log(Population)", y = "Log(Capital Expenditure)")

  ggplot(Cleaned_TMA_Data, aes(x = log(Population), y = log(Recrrent_Expenditure))) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Log(Population) vs. Log(Recurrent Expenditure)",
       x = "Log(Population)", y = "Log(Recurrent Expenditure)")

#  Square root transformation for Capital Expenditure
sqrt_cap_mod <- lm(sqrt(Capital_Expenditure) ~ Population, data = Cleaned_TMA_Data)
summary(sqrt_cap_mod)
## 
## Call:
## lm(formula = sqrt(Capital_Expenditure) ~ Population, data = Cleaned_TMA_Data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -681.5 -497.2 -121.5  523.5 1050.7 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)  
## (Intercept) 2352.675117  735.947371   3.197   0.0151 *
## Population     0.001883    0.002686   0.701   0.5059  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 620.4 on 7 degrees of freedom
## Multiple R-squared:  0.0656, Adjusted R-squared:  -0.06788 
## F-statistic: 0.4915 on 1 and 7 DF,  p-value: 0.5059
perform_diagnostics(sqrt_cap_mod, "Square root Capital Expenditure Model")
## [1] "Durbin-Watson Test ( Square root Capital Expenditure Model ):"
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 1.7126, p-value = 0.205
## alternative hypothesis: true autocorrelation is greater than 0
## 
## [1] "Breusch-Pagan Test ( Square root Capital Expenditure Model ):"
## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 0.018825, df = 1, p-value = 0.8909

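After the log and square-root transformations, both expenditure models remain non-significant (p = 0.994 for log recurrent, p = 0.486 for log capital and p = 0.506 for square-root capital), and some assumptions are still violated; for example, the log recurrent expenditure model still shows residual autocorrelation (Durbin-Watson = 1.07, p = 0.022).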
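
Because the response in the log models is log-transformed, the Population coefficient is read as a proportional rather than an absolute change. A small sketch of that conversion, using the log capital expenditure slope reported above; the 100,000-person increment is an arbitrary choice made only for readability.

```r
# Slope from the log(Capital_Expenditure) ~ Population output above
b_pop <- 0.000001354

# Illustrative population increment (an assumption, chosen for readability)
delta_pop <- 100000

# In a log-level model, a change of delta in x multiplies y by exp(b * delta)
pct_change <- (exp(b_pop * delta_pop) - 1) * 100
print(round(pct_change, 1))
```

For this estimate the conversion gives roughly a 14.5% difference in capital expenditure per 100,000 additional residents, but since the coefficient is not statistically significant (p = 0.486) the figure should not be read as an established effect.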

  • Quadratic model
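# Quadratic term: the square of the predictor (Population)
Cleaned_TMA_Data$Population_Squared <- Cleaned_TMA_Data$Population^2

# Squared expenditure columns (not used in the model below)
Cleaned_TMA_Data$Recrrent_Expenditure_squared <- Cleaned_TMA_Data$Recrrent_Expenditure^2
Cleaned_TMA_Data$Capital_Expenditure_squared <- Cleaned_TMA_Data$Capital_Expenditure^2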

mod_quad <- lm(cbind(Capital_Expenditure, Recrrent_Expenditure) ~ Population + Population_Squared, data = Cleaned_TMA_Data)

# View the summary
summary(mod_quad)
## Response Capital_Expenditure :
## 
## Call:
## lm(formula = Capital_Expenditure ~ Population + Population_Squared, 
##     data = Cleaned_TMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2778188 -2565965 -1787984  2500686  7219878 
## 
## Coefficients:
##                            Estimate       Std. Error t value Pr(>|t|)
## (Intercept)        30981222.2575110 48075740.0244342   0.644    0.543
## Population             -207.2691431      410.7546535  -0.505    0.632
## Population_Squared        0.0004256        0.0008002   0.532    0.614
## 
## Residual standard error: 3974000 on 6 degrees of freedom
## Multiple R-squared:  0.1035, Adjusted R-squared:  -0.1953 
## F-statistic: 0.3465 on 2 and 6 DF,  p-value: 0.7204
## 
## 
## Response Recrrent_Expenditure :
## 
## Call:
## lm(formula = Recrrent_Expenditure ~ Population + Population_Squared, 
##     data = Cleaned_TMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -6717413 -2654785  -655915  2983799  6860831 
## 
## Coefficients:
##                           Estimate      Std. Error t value Pr(>|t|)
## (Intercept)        91602005.532075 67845678.733128   1.350    0.226
## Population             -663.374432      579.667172  -1.144    0.296
## Population_Squared        0.001308        0.001129   1.159    0.291
## 
## Residual standard error: 5609000 on 6 degrees of freedom
## Multiple R-squared:  0.1936, Adjusted R-squared:  -0.07525 
## F-statistic: 0.7201 on 2 and 6 DF,  p-value: 0.5245
# Scatter plots with quadratic fits
ggplot(Cleaned_TMA_Data, aes(x = Population, y = Capital_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
  labs(x = "Population", y = "Capital Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Capital Expenditure") +
  scale_y_continuous(labels = comma)

ggplot(Cleaned_TMA_Data, aes(x = Population, y = Recrrent_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x + I(x^2), se = TRUE) +
  labs(x = "Population", y = "Recurrent Expenditure (Ghana Cedis)", title = "Quadratic Relationship between Population and Recurrent Expenditure") +
  scale_y_continuous(labels = comma)

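The quadratic models are also non-significant: neither the linear nor the squared population term is significant for either expenditure type (all p > 0.29), and the overall F-tests give p = 0.720 for capital expenditure and p = 0.525 for recurrent expenditure.

Therefore, the regression analysis above finds no statistically significant linear relationship between population and either expenditure type, and the transformed and quadratic models also fail to capture the relationship with statistically significant results. With only nine observations, these tests have limited power to detect anything but a strong relationship.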
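
One way to check directly whether the squared term adds anything over the straight-line fit is a nested F-test. The sketch below uses hypothetical simulated data (`toy`), not the TMA data, and writes the quadratic term as `I(Population^2)` in the formula instead of using a precomputed column.

```r
# Hypothetical data: 9 observations with no true relationship by construction
set.seed(42)
toy <- data.frame(Population = seq(170000, 350000, length.out = 9))
toy$Expenditure <- 2e7 + rnorm(9, sd = 4e6)

lin_fit  <- lm(Expenditure ~ Population, data = toy)
quad_fit <- lm(Expenditure ~ Population + I(Population^2), data = toy)

# F-test of the linear model against the quadratic model; the second row
# tests whether adding the squared term reduces the residual sum of squares
cmp <- anova(lin_fit, quad_fit)
print(cmp)
```

A large p-value in the second row would suggest the squared term is not improving the fit, which is the same conclusion the coefficient t-test on the squared term gives when only one term is added.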

1.4 What is the relationship between revenue growth and infrastructure delivery (Model)

This analysis uses the total revenue growth rate as the measure of revenue growth and capital expenditure per capita as the measure of infrastructure delivery.

# Descriptive statistics
Cleaned_TMA_Data %>% skim(Capital_Exp_Per_Capita)
Data summary
Name Piped data
Number of rows 9
Number of columns 85
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Capital_Exp_Per_Capita 0 1 34.25 16.48 16.37 23.81 25.78 49 58.96 ▇▇▁▂▅
Cleaned_TMA_Data %>% skim(TtRev_Growth_Rate)
Data summary
Name Piped data
Number of rows 9
Number of columns 85
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
TtRev_Growth_Rate 0 1 6.18 25.09 -53.42 7.9 13.15 20.68 30.05 ▂▁▂▆▇
# Histograms
ggplot(Cleaned_TMA_Data, aes(x = Capital_Exp_Per_Capita)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Capital expenditure per capita", x = "Capital expenditure per capita") +
  scale_x_continuous(labels = comma)

ggplot(Cleaned_TMA_Data, aes(x = TtRev_Growth_Rate)) +
  geom_histogram(bins = 10, fill = "dodgerblue", color = "black") +
  labs(title = "Distribution of Total Revenue Growth Rate", x = "Total revenue growth rate") +
  scale_x_continuous(labels = percent_format(scale = 1))  # values are already in percent units

# Plotting Trends 

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = TtRev_Growth_Rate, color = "Total Revenue Growth Rate")) +
  geom_point(aes(y = TtRev_Growth_Rate, color = "Total Revenue Growth Rate")) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_line(aes(y = Capital_Exp_Per_Capita, color = "Capital Expenditure Per Capita")) +
  geom_point(aes(y = Capital_Exp_Per_Capita, color = "Capital Expenditure Per Capita")) +
  labs(
    title = "Total Revenue Growth Rate vs. Capital Expenditure Per Capita",
    x = "Year",
    y = "Total Revenue Growth Rate (%)"  
  ) +
  scale_y_continuous(
    labels = percent_format(scale = 1),  
    sec.axis = sec_axis(~., name = "Capital Expenditure Per Capita")
  ) +
  scale_color_manual(
    values = c("Total Revenue Growth Rate" = "lightseagreen", "Capital Expenditure Per Capita" = "indianred"),
    name = "Type"
  ) +
  theme(axis.title.y.right = element_text(vjust = 2))

The histograms show an uneven distribution of capital expenditure per capita. The trend plot shows that the total revenue growth rate, which fluctuated sharply (from -53.4% to 30.1% over the period), does not move in step with capital expenditure per capita.

1.4.1 Regression results

mod5 <- lm(Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_TMA_Data)
summary(mod5)
## 
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ TtRev_Growth_Rate, data = Cleaned_TMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.862 -10.520  -8.464  14.772  24.719 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       34.259021   6.069954   5.644 0.000779 ***
## TtRev_Growth_Rate -0.001256   0.248289  -0.005 0.996105    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.62 on 7 degrees of freedom
## Multiple R-squared:  3.655e-06,  Adjusted R-squared:  -0.1429 
## F-statistic: 2.558e-05 on 1 and 7 DF,  p-value: 0.9961
ggplot(Cleaned_TMA_Data, aes(x = TtRev_Growth_Rate, y = Capital_Exp_Per_Capita)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)+
  labs(title = "Revenue Growth vs. Capital Expenditure (Per Capita)",
       x = "Total Revenue Growth Rate (%)",
       y = "Capital Expenditure Per Capita")

The regression results show no statistically significant relationship between the total revenue growth rate and infrastructure delivery (capital expenditure per capita): the p-value (0.9961) is well above the 0.05 significance level. Changes in revenue growth therefore do not significantly predict changes in capital expenditure per capita in this model. The R-squared (3.655e-06) indicates that essentially none of the variation in capital expenditure per capita is explained by the total revenue growth rate.

1.5 What is the relationship between expenditure growth and infrastructure delivery?

  • Regression results using expenditure growth (Expenditure_Growth) and infrastructure delivery (capital expenditure per capita).
Cleaned_TMA_Data$Expenditure_Growth <- c(NA, diff(Cleaned_TMA_Data$Total_Expenditure) / Cleaned_TMA_Data$Total_Expenditure[-nrow(Cleaned_TMA_Data)]) * 100

mod6 <- lm(Capital_Exp_Per_Capita ~ Expenditure_Growth, data = Cleaned_TMA_Data)
  summary(mod6)
## 
## Call:
## lm(formula = Capital_Exp_Per_Capita ~ Expenditure_Growth, data = Cleaned_TMA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.031 -11.888  -7.608  15.093  21.471 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)   
## (Intercept)        37.63881    6.61269   5.692  0.00127 **
## Expenditure_Growth -0.06207    0.14345  -0.433  0.68035   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.12 on 6 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.03026,    Adjusted R-squared:  -0.1314 
## F-statistic: 0.1872 on 1 and 6 DF,  p-value: 0.6803
  ggplot(Cleaned_TMA_Data, aes(x = Expenditure_Growth, y = Capital_Exp_Per_Capita)) +
    geom_point() + geom_smooth(method = "lm", se = TRUE)+
    labs(title = "Expenditure Growth vs. Capital Expenditure (Per Capita)",
         x = "Expenditure Growth Rate (%)",
         y = "Capital Expenditure Per Capita")

The linear regression shows no statistically significant relationship between expenditure growth and capital expenditure per capita (slope = -0.062, p = 0.680, R-squared = 0.030).
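
The Expenditure_Growth column used in this model is a year-on-year percentage change. A toy illustration of the same calculation, with hypothetical values chosen only to make the arithmetic easy to follow:

```r
# Hypothetical yearly totals, for illustration only
total_exp <- c(100, 120, 90, 99)

# diff() gives x[t] - x[t-1]; dividing by the previous year's value and
# multiplying by 100 gives percentage growth. The first year has no
# predecessor, so it is set to NA.
growth <- c(NA, diff(total_exp) / total_exp[-length(total_exp)]) * 100
print(growth)
```

The leading NA is why the regression output above reports one observation deleted due to missingness.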

2 Sheet 2

2.1 What is the relationship between allocative and funding decision-making and revenue patterns?

# no variables

2.2 What is the relationship between allocative decision-making and expenditure patterns?

  • No direct variables are available for this question; descriptive statistics for closely related variables are shown below.
# Expenditure Composition:
Cleaned_TMA_Data$CapExp_Pct <- (Cleaned_TMA_Data$Capital_Expenditure / Cleaned_TMA_Data$Total_Expenditure) 
Cleaned_TMA_Data$CapExp_Rev_Ratio <- (Cleaned_TMA_Data$Capital_Expenditure / Cleaned_TMA_Data$Total_Revenue)



# Expenditure Composition 
ggplot(Cleaned_TMA_Data, aes(x = Year, y = CapExp_Pct)) +
  geom_bar(stat = "identity", fill = "dodgerblue") +
  geom_point()+
  labs(title = "Capital Expenditure as Percentage of Total Expenditure",
       x = "Year",
       y = "Percentage") +
  scale_y_continuous(labels = percent_format(accuracy = 1))

# Trends of Revenue and Expenditure over the years.

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue, color = "Total Revenue")) +
  geom_point(aes(y = Total_Revenue, color = "Total Revenue")) +
  geom_line(aes(y = Total_Expenditure, color = "Total Expenditure")) +
  geom_point(aes(y = Total_Expenditure, color = "Total Expenditure")) +
  labs(title = "Revenue and Expenditure Trends Over Years",
       x = "Year",
       y = "Amount (Ghana Cedis)", color = "Type") +
  scale_color_manual(values = c("Total Revenue" = "blue", "Total Expenditure" = "red")) +
  scale_y_continuous(labels = comma) 

ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue, color = "Total Revenue"), size = 1) +
  geom_line(aes(y = IGF, color = "IGF"), size = 1) +
  geom_line(aes(y = DACF, color = "DACF"), size = 1) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure"), size = 1) +
    geom_line(aes(y = Recrrent_Expenditure , color = "Recurrent Expenditure"), size = 1) +
  geom_line(aes(y = Total_Expenditure, color = "Total Expenditure"), size = 1) +
  geom_line(aes(y = Others_Sources, color = "Other Sources"), size = 1) +
  labs(
    title = "Revenue and Expenditure Trends",
    x = "Year",
    y = "Amount (Ghana Cedis)",
    color = "Type"
  ) +
  scale_color_manual(
    values = c(
      "Total Revenue" = "blue",
      "Other Sources" = "skyblue",
      "IGF" = "green",
      "DACF" = "darkgray",
      "Capital Expenditure" = "purple",
      "Total Expenditure" = "red",
      "Recurrent Expenditure" = "yellow"
    )
  ) +
  scale_y_continuous(labels = scales::comma) +
  theme(
    legend.position = "right", 
    legend.title = element_text(face = "bold"), 
    plot.title = element_text(hjust = 0.5, face = "bold") 
  )

# IGF to Total Expenditure Ratio 
ggplot(Cleaned_TMA_Data, aes(x = Year, y = IGF_TE)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "IGF to Total Expenditure Ratio Over Years",
    x = "Year",
    y = "Ratio (IGF/Total Expenditure)"
  ) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) 

# CapExp_Rev_Ratio plot.
ggplot(Cleaned_TMA_Data, aes(x = Year, y = CapExp_Rev_Ratio)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "Capital Expenditure to Total Revenue Ratio Over Years",
    x = "Year",
    y = "Ratio (Capital Expenditure/Total Revenue)"
  ) +
  scale_y_continuous(labels = comma) 

cor.test(Cleaned_TMA_Data$Total_Expenditure, Cleaned_TMA_Data$Total_Revenue)
## 
##  Pearson's product-moment correlation
## 
## data:  Cleaned_TMA_Data$Total_Expenditure and Cleaned_TMA_Data$Total_Revenue
## t = 4.6749, df = 7, p-value = 0.002274
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4885260 0.9723908
## sample estimates:
##       cor 
## 0.8702901

In the plots above, capital expenditure as a percentage of total expenditure peaks around 2014 and shows a sustained decline after 2016. Total revenue and total expenditure are strongly and significantly correlated (r = 0.87, p = 0.002), with both peaking around 2016 and falling afterwards.

2.3 What is the relationship between population trend, service delivery and revenue and expenditure patterns?

# Revenue Per Capita
Cleaned_TMA_Data$Total_Revenue_Per_Capita <- Cleaned_TMA_Data$Total_Revenue / Cleaned_TMA_Data$Population
Cleaned_TMA_Data$IGF_Per_Capita <- Cleaned_TMA_Data$IGF / Cleaned_TMA_Data$Population
Cleaned_TMA_Data$DACF_Per_Capita <- Cleaned_TMA_Data$DACF / Cleaned_TMA_Data$Population

# Time Series Plots

# Total Revenue and Expenditure Trends
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue, color = "Total Revenue"), size = 1) +
  geom_point(aes(y = Total_Revenue, color = "Total Revenue")) +
  geom_line(aes(y = IGF, color = "IGF"), size = 1) +
  geom_point(aes(y = IGF, color = "IGF")) +
  geom_line(aes(y = DACF, color = "DACF"), size = 1) +
  geom_point(aes(y = DACF, color = "DACF")) +
  geom_line(aes(y = Capital_Expenditure, color = "Capital Expenditure"), size = 1) +
  geom_line(aes(y = Recrrent_Expenditure , color = "Recurrent Expenditure"), size = 1) +
  geom_point(aes(y = Capital_Expenditure, color = "Capital Expenditure")) +
  geom_line(aes(y = Total_Expenditure, color = "Total Expenditure"), size = 1) +
  geom_point(aes(y = Total_Expenditure, color = "Total Expenditure")) +
  geom_line(aes(y = Others_Sources, color = "Other Sources"), size = 1) +
  geom_point(aes(y = Others_Sources, color = "Other Sources")) +
  labs(
    title = "Revenue and Expenditure Trends Over Years",
    x = "Year",
    y = "Amount (Ghana Cedis)",
    color = "Type"
  ) +
  scale_color_manual(
    values = c(
      "Total Revenue" = "blue",
      "Other Sources" = "skyblue",
      "IGF" = "green",
      "DACF" = "darkgray",
      "Capital Expenditure" = "purple",
      "Total Expenditure" = "red",
      "Recurrent Expenditure" = "yellow"
    )
  ) +
  scale_y_continuous(labels = comma) +
  theme(
    legend.position = "right",
    legend.title = element_text(face = "bold"),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

# Population Trend
ggplot(Cleaned_TMA_Data, aes(x = Year, y = Population)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "Population Trend Over Years",
    x = "Year",
    y = "Population"
  ) +
  scale_y_continuous(labels = comma) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title = element_text(face = "bold")
  )

# IGF to Total Expenditure Ratio
ggplot(Cleaned_TMA_Data, aes(x = Year, y = IGF_TE)) +
  geom_line(color = "steelblue", size = 1) +
  geom_point(size = 2.5) +
  labs(
    title = "IGF to Total Expenditure Ratio Over Years",
    x = "Year",
    y = "Ratio (IGF/Total Expenditure)"
  ) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.title = element_text(face = "bold")
  )

# Per capita plot
ggplot(Cleaned_TMA_Data, aes(x = Year)) +
  geom_line(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
  geom_point(aes(y = Total_Revenue_Per_Capita, color = "Total Revenue Per Capita")) +
  geom_line(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
  geom_point(aes(y = IGF_Per_Capita, color = "IGF Per Capita")) +
  geom_line(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
  geom_point(aes(y = DACF_Per_Capita, color = "DACF Per Capita")) +
  labs(title = "Revenue Per Capita trends", x = "Year", y = "Amount (Ghana Cedis)", color = "Type") +
  scale_y_continuous(labels = comma) 

cor_matrix <- cor(Cleaned_TMA_Data[, c("Population", "Total_Revenue", "Total_Expenditure", "IGF_TE", "CapExp_Pct", "IGF")], use = "complete.obs")
print(cor_matrix)
##                    Population Total_Revenue Total_Expenditure      IGF_TE
## Population         1.00000000   -0.01654634       -0.03275889 -0.11415765
## Total_Revenue     -0.01654634    1.00000000        0.87029014 -0.38486443
## Total_Expenditure -0.03275889    0.87029014        1.00000000 -0.77523076
## IGF_TE            -0.11415765   -0.38486443       -0.77523076  1.00000000
## CapExp_Pct         0.33314280   -0.41553743       -0.27870408 -0.06141442
## IGF               -0.24944527    0.94340185        0.82900480 -0.31458341
##                    CapExp_Pct        IGF
## Population         0.33314280 -0.2494453
## Total_Revenue     -0.41553743  0.9434019
## Total_Expenditure -0.27870408  0.8290048
## IGF_TE            -0.06141442 -0.3145834
## CapExp_Pct         1.00000000 -0.5982153
## IGF               -0.59821527  1.0000000
corrplot(cor_matrix, main = "Correlation matrix of population and expenditure patterns")

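The correlation matrix shows strong positive correlations between total revenue and total expenditure (r = 0.87), and between IGF and both total revenue (r = 0.94) and total expenditure (r = 0.83). Population is only weakly correlated with the other variables (|r| ≤ 0.33).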
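
cor() reports only the coefficients. To see which pairs are statistically distinguishable from zero, cor.test() can be applied to each pair. A minimal base-R sketch on hypothetical data follows; the `toy` columns are simulated stand-ins, not the TMA variables.

```r
# Hypothetical data with 9 observations
set.seed(7)
toy <- data.frame(a = rnorm(9))
toy$b <- toy$a + rnorm(9, sd = 0.3)   # built to correlate with a
toy$c <- rnorm(9)                     # unrelated to a and b by construction

vars <- names(toy)
p_mat <- matrix(NA_real_, length(vars), length(vars),
                dimnames = list(vars, vars))

# Fill the off-diagonal cells with the p-value of each pairwise Pearson test
for (i in seq_along(vars)) {
  for (j in seq_along(vars)) {
    if (i != j) {
      p_mat[i, j] <- cor.test(toy[[i]], toy[[j]])$p.value
    }
  }
}
print(round(p_mat, 3))
```

When many pairs are tested at once, some small p-values are expected by chance, so the matrix is better treated as a screening aid than as a set of confirmed findings.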

2.3.1 Regression Analysis

# Total Revenue vs Population
model_revenue_pop <- lm(Total_Revenue ~ Population, data = Cleaned_TMA_Data)
summary(model_revenue_pop)
## 
## Call:
## lm(formula = Total_Revenue ~ Population, data = Cleaned_TMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -12409733  -4647150  -1125595   6134998  10886395 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)   
## (Intercept) 36772413.660  9208032.446   3.994  0.00523 **
## Population        -1.471       33.608  -0.044  0.96630   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7763000 on 7 degrees of freedom
## Multiple R-squared:  0.0002738,  Adjusted R-squared:  -0.1425 
## F-statistic: 0.001917 on 1 and 7 DF,  p-value: 0.9663
# Total Expenditure vs Population
model_expenditure_pop <- lm(Total_Expenditure ~ Population, data = Cleaned_TMA_Data)
summary(model_expenditure_pop)
## 
## Call:
## lm(formula = Total_Expenditure ~ Population, data = Cleaned_TMA_Data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -19725491   -828997    260211   5016575   8886541 
## 
## Coefficients:
##                 Estimate   Std. Error t value Pr(>|t|)  
## (Intercept) 36033170.438 10690119.426   3.371   0.0119 *
## Population        -3.384       39.018  -0.087   0.9333  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9012000 on 7 degrees of freedom
## Multiple R-squared:  0.001073,   Adjusted R-squared:  -0.1416 
## F-statistic: 0.00752 on 1 and 7 DF,  p-value: 0.9333
# Capital Expenditure vs Total Revenue and IGF_TE
model_capital_rev_igf <- lm(Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_TMA_Data)
summary(model_capital_rev_igf)
## 
## Call:
## lm(formula = Capital_Expenditure ~ Total_Revenue + IGF_TE, data = Cleaned_TMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3537240 -1751092  -268196  1891113  5421506 
## 
## Coefficients:
##                       Estimate       Std. Error t value Pr(>|t|)  
## (Intercept)    21628659.817864  10620833.425263   2.036   0.0879 .
## Total_Revenue        -0.001251         0.166573  -0.008   0.9943  
## IGF_TE        -20410550.193464  10294834.110031  -1.983   0.0947 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3158000 on 6 degrees of freedom
## Multiple R-squared:  0.434,  Adjusted R-squared:  0.2453 
## F-statistic:   2.3 on 2 and 6 DF,  p-value: 0.1813
# IGF_TE vs Population and Total Revenue
model_igfte_pop_rev <- lm(IGF_TE ~ Population + Total_Revenue, data = Cleaned_TMA_Data)
summary(model_igfte_pop_rev)
## 
## Call:
## lm(formula = IGF_TE ~ Population + Total_Revenue, data = Cleaned_TMA_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.13417 -0.06928 -0.01362  0.08779  0.18710 
## 
## Coefficients:
##                      Estimate      Std. Error t value Pr(>|t|)  
## (Intercept)    0.917775277753  0.266663923178   3.442   0.0138 *
## Population    -0.000000173477  0.000000537626  -0.323   0.7579  
## Total_Revenue -0.000000006259  0.000000006045  -1.035   0.3404  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1242 on 6 degrees of freedom
## Multiple R-squared:  0.1627, Adjusted R-squared:  -0.1165 
## F-statistic: 0.5827 on 2 and 6 DF,  p-value: 0.5871
# Visualizations

# Scatter plot: Total Revenue vs Population
ggplot(Cleaned_TMA_Data, aes(x = Population, y = Total_Revenue)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Total Revenue vs Population", x = "Population", y = "Total Revenue") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma)

# Scatter plot: Total Expenditure vs Population
ggplot(Cleaned_TMA_Data, aes(x = Population, y = Total_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Total Expenditure vs Population", x = "Population", y = "Total Expenditure") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma)

# Scatter plot: Capital Expenditure vs Total Revenue
ggplot(Cleaned_TMA_Data, aes(x = Total_Revenue, y = Capital_Expenditure)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Capital Expenditure vs Total Revenue", x = "Total Revenue", y = "Capital Expenditure") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma)

# Scatter plot: IGF_TE vs Population
ggplot(Cleaned_TMA_Data, aes(x = Population, y = IGF_TE)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "IGF_TE vs Population", x = "Population", y = "IGF_TE") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = percent_format(accuracy = 1))

# Scatter plot: IGF_TE vs Total Revenue
ggplot(Cleaned_TMA_Data, aes(x = Total_Revenue, y = IGF_TE)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "IGF_TE vs Total Revenue", x = "Total Revenue", y = "IGF_TE") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = percent_format(accuracy = 1))

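None of the regressions above shows a statistically significant linear relationship at the 5% level: Total Revenue on Population (p = 0.966), Total Expenditure on Population (p = 0.933), Capital Expenditure on Total Revenue and IGF_TE (overall p = 0.181), and IGF_TE on Population and Total Revenue (overall p = 0.587). The only coefficient that comes close is IGF_TE in the Capital Expenditure model (p = 0.095), which is significant only at the 10% level. With nine observations, these models have little power to detect anything but very strong effects.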
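The overall p-values quoted for the four models can be reproduced from the F-statistics and degrees of freedom printed in the summaries. The sketch below does this; the data frame `models` and its column names are introduced here only for illustration, and the F values are the rounded figures from the output above.

```r
# Recompute overall-model p-values from the reported F-statistics.
# All numbers are copied from the summary() output above.
models <- data.frame(
  model  = c("Total_Revenue ~ Population",
             "Total_Expenditure ~ Population",
             "Capital_Expenditure ~ Total_Revenue + IGF_TE",
             "IGF_TE ~ Population + Total_Revenue"),
  f_stat = c(0.001917, 0.00752, 2.3, 0.5827),
  df1    = c(1, 1, 2, 2),   # numerator df (number of predictors)
  df2    = c(7, 7, 6, 6)    # residual df
)

# Upper-tail probability of the F distribution gives the p-value
models$p_value <- pf(models$f_stat, models$df1, models$df2,
                     lower.tail = FALSE)

# Flag models that are significant at the 5% level
models$significant_5pct <- models$p_value < 0.05

print(models)
```

When the fitted objects are available, the same three inputs can be read from `summary(model_revenue_pop)$fstatistic`, which holds the F value, numerator df, and denominator df.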

2.4 What is the relationship between service delivery and revenue and expenditure patterns?

# no variables

2.5 SHEET 3